This example highlights my data wrangling and visualization skills in an exploratory analysis of racial/ethnic inequity, poverty, and academic performance in public schools in Texas. This was a pet project that I created to satisfy my own curiosity on the topic and to get to know the publicly available data related to education in Texas. This page walks through the steps required to go from several raw, public data sets to a series of interactive data visualizations that explore relationships between race/ethnicity, poverty, and educational outcomes (like these), as well as how those relationships are moderated by regional private school attendance.
You can toggle the “code” button to the right to see this chunk. You can hide long code chunks by toggling the same button.
library(tidyverse)
library(readxl)
library(plotly)
library(ggpubr)
library(gapminder)
library(kableExtra)
library(summarytools)
For educational outcomes, I use the publicly available Texas
Academic Performance Report (TAPR) data for the year 2019. This is a
very large data set that I have subsetted to focus on the
percentage of students in three racial/ethnic categories (variable:
Student Group) who meet the expectations of their
respective grade levels (variable: meets_level_pct). I also
include the numeric campus identifier cid and the
respective denominator for each percentage (variable:
meets_denom), which gives the absolute number of students
from each category at the school. I use this value to weight the
observations in the Loess smoothing function and to set the size
parameter in the plots. For the original variable names, see the TAPR
Codebook.
As described in the codebook for the data set, percentages for student groups with very small denominators are masked (coded with the negative placeholder value -1) to prevent the identification of individuals in the data. I exclude these masked values from the data set, as well as any school that reports missing data for all three student groups. When the smallest group at a school requires masking, the denominator of the second smallest student group is also masked, with a distinct placeholder (-3). I replace these masked denominators with a rough proxy: half the size of the next largest group at the same school.
This code does the following:
- Loads the TAPR subset and renames the variables, keeping the percentage (meets_level_pct) of students in each racial/ethnic category that meet the grade-level expectations.
- Pivots the data to long format by Student Group and group_denom.
- Creates ln_meets_denom, the logged group size, for use as the weight in the Loess smoothing algorithm.
# Load TAPR data; rename vars
campus_2019 <- read.csv("docs/TAPR_2019_subset.csv") %>%
transmute(
cid = CAMPUS,
`African American` = CDB00A001219R,
`Hispanic` = CDH00A001219R,
`White` = CDW00A001219R,
aa_total = CDB00A001019D,
h_total = CDH00A001019D,
w_total = CDW00A001019D) %>%
# Filter out schools with no data (~ 10% of campuses)
filter(!is.na(`African American`) | !is.na(`Hispanic`) | !is.na(`White`))
# Pivot values and descriptive categories
data <- cbind(
pivot_longer(campus_2019 %>% select(1:4), cols = 2:4, names_to = "Student Group", values_to = "Meets Grade Level (%)"),
pivot_longer(campus_2019 %>% select(5:7), cols = 1:3, names_to = "group_denom", values_to = "Size of Student Group at School")) %>%
group_by(cid) %>%
mutate(
`Meets Grade Level (%)` = ifelse(`Meets Grade Level (%)` == -1, NA, `Meets Grade Level (%)`), # rate = -1 is masked data and unusable
`Size of Student Group at School` = ifelse(`Size of Student Group at School` == -1, NA, # denominator = -1 is masked data and unusable
`Size of Student Group at School`),
`Size of Student Group at School` = ifelse(`Size of Student Group at School` == -3, # denominator = -3 is second smallest group
nth(`Size of Student Group at School`, 2, order_by = `Size of Student Group at School`) / 2,
`Size of Student Group at School`), # recover with reasonable proxy of half largest group
ln_meets_denom = log(`Size of Student Group at School`)) %>% # log will be better for size parameter
ungroup()
# Show Data
data[1:9,] %>%
kbl(caption = "Structure of the Data Set: Showing the First Three Schools") %>%
kable_paper("hover", full_width = F)

| cid | Student Group | Meets Grade Level (%) | group_denom | Size of Student Group at School | ln_meets_denom |
|---|---|---|---|---|---|
| 1902001 | African American | 71 | aa_total | 7 | 1.945910 |
| 1902001 | Hispanic | 58 | h_total | 19 | 2.944439 |
| 1902001 | White | 67 | w_total | 180 | 5.192957 |
| 1902041 | African American | 0 | aa_total | 5 | 1.609438 |
| 1902041 | Hispanic | 48 | h_total | 31 | 3.433987 |
| 1902041 | White | 60 | w_total | 282 | 5.641907 |
| 1902103 | African American | 50 | aa_total | 6 | 1.791759 |
| 1902103 | Hispanic | 75 | h_total | 12 | 2.484907 |
| 1902103 | White | 62 | w_total | 330 | 5.799093 |
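The masking rules described above can be checked on a single school's three denominators. The following is a minimal base-R sketch rather than the code used to build the table; the helper name `unmask_denominators` and the example values are hypothetical.

```r
# Hypothetical helper: recode one school's three group denominators.
# -1 marks a masked (unusable) value; -3 marks the masked second-smallest group.
unmask_denominators <- function(sizes) {
  sizes[!is.na(sizes) & sizes == -1] <- NA
  is_proxy <- !is.na(sizes) & sizes == -3
  observed <- sizes[!is.na(sizes) & sizes > 0]
  if (any(is_proxy) && length(observed) > 0) {
    # Proxy: half of the smallest group that was actually reported
    sizes[is_proxy] <- min(observed) / 2
  }
  sizes
}

unmask_denominators(c(-1, -3, 180))  # NA 90 180
unmask_denominators(c(7, 19, 180))   # unchanged: 7 19 180
```

When -3 appears, the smallest group is already masked, so the only reported value left is the largest group, and the proxy is half of it.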
For campus-level poverty statistics, I use the public data available at the Texas Education Agency’s (TEA) school data visualizer, which includes an extensive set of campus-level attributes. I am primarily interested in the percentage of students from economically disadvantaged backgrounds, which TEA defines as “…Students eligible for free or reduced-price lunch or other public assistance as reported on the PEIMS October snapshot” (see definitions). Unfortunately, the source does not reveal precisely which year this data comes from. School-level poverty is a relatively stable attribute, though, so this is not a huge concern.
The challenge is that this data does not include the campus ID as a
stand-alone variable. I must first extract it from a longer string to
create a comparable cid variable that can be used to join
this data set to the first one. Because there is no universal pattern to
parse this string, I have to apply three splits and
extract the rightmost element in order to recover all the campus
identifiers. This is because some schools have more than one
“ (” pattern on the right-hand side of the “||” pattern.
school <- read_excel("docs/school.xlsx")
school <- school %>%
transmute(
Campus = `Campus or District`,
`Eco. Disadvantaged (%)` = `% Eco Disadvantaged`,
`Campus Type` = factor(`Entity Type`),
enrollment_public = Enrollment) %>%
mutate(
n = str_split(Campus, " \\|\\|", simplify = TRUE)[,2], # First split on "||"
nn = str_split(n, " \\(", simplify = TRUE)[,2], # Next split on " ("
nnn = str_split(n, " \\(", simplify = TRUE)[,3], # A few have an additional " ("
cid = as.numeric(gsub("\\D+", "", nn)),
cid = ifelse(is.na(cid), as.numeric(gsub("\\D+", "", nnn)), cid)) %>%
select(-starts_with("n"))
school[1:5,] %>%
kbl(caption = "Structure of the Poverty Data after Recovering the Campus Identifier") %>%
kable_paper("hover", full_width = F)

| Campus | Eco. Disadvantaged (%) | Campus Type | enrollment_public | cid |
|---|---|---|---|---|
| CAYUGA H S || CAYUGA ISD (001902001) | 34.9 | High School | 166 | 1902001 |
| CAYUGA MIDDLE || CAYUGA ISD (001902041) | 40.6 | Middle & Jr. High School | 133 | 1902041 |
| CAYUGA EL || CAYUGA ISD (001902103) | 38.1 | Elementary School | 236 | 1902103 |
| ELKHART H S || ELKHART ISD (001903001) | 47.4 | High School | 344 | 1903001 |
| ELKHART MIDDLE || ELKHART ISD (001903041) | 45.1 | Middle & Jr. High School | 268 | 1903041 |
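In the rows shown above, the campus identifier is the last parenthesised run of digits in the string, so a single anchored regular expression is one possible alternative to the repeated splits. This is only a sketch under that assumption, which has not been checked against the full file; `extract_cid` and the second example string are hypothetical.

```r
# Hypothetical helper: pull the trailing "(digits)" group out of a campus string.
extract_cid <- function(campus) {
  pattern <- ".*\\(([0-9]+)\\)[[:space:]]*$"
  # sub() returns its input unchanged when nothing matches, so guard with grepl()
  digits <- ifelse(grepl(pattern, campus), sub(pattern, "\\1", campus), NA_character_)
  as.numeric(digits)
}

extract_cid("CAYUGA H S || CAYUGA ISD (001902001)")           # 1902001
extract_cid("EXAMPLE EL || EXAMPLE ISD (NORTH) (001234567)")  # 1234567
```

Because the leading `.*` is greedy, the capture group lands on the final set of parentheses, which is the same "rightmost element" idea as the split-based approach.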
This data set from TEA includes the codes that can be used for geographic identification in the plots.
codes <- read_csv("docs/school and district.csv")
codes <- codes %>%
transmute(
cid = as.numeric(gsub("[^[:alnum:] ]", "", `School Number`)),
district = as.numeric(gsub("[^[:alnum:] ]", "", `District Number`)),
county = as.numeric(gsub("[^[:alnum:] ]", "", `County Number`)),
region = as.numeric(gsub("[^[:alnum:] ]", "", `ESC Region Served`)))
Taking the TAPR 2019 data as the base set, I match approximately
98.3% of campuses with the TEA school-level data by their campus ID
(cid). For parsimony, I exclude the 1.7% of campuses that
do not exist in both data sets. Masking and missing values leave
approximately 10% of the TAPR 2019 observations missing; however,
because of how the masking is coded, the masked observations are
likely to come from smaller, relatively more homogeneous schools.
Otherwise, they would not have very small absolute
numbers of students from any of the three racial/ethnic backgrounds.
#campus <- left_join(data, school, by = "cid") %>% left_join(codes, by = "cid") # not as complete
campus <- left_join(school, codes, by = "cid") %>% right_join(data, by = "cid") %>%
filter(!is.na(Campus))
print(
dfSummary(campus,
plain.ascii = FALSE,
style = "grid",
graph.magnif = 0.75,
valid.col = FALSE,
tmp.img.dir = "/tmp",
silent = TRUE),
method = "render")

| No | Variable | Freqs (% of Valid) | Missing |
|---|---|---|---|
| 1 | Campus [character] | | 0 (0.0%) |
| 2 | Eco. Disadvantaged (%) [numeric] | 982 distinct values | 0 (0.0%) |
| 3 | Campus Type [factor] | | 0 (0.0%) |
| 4 | enrollment_public [numeric] | 1728 distinct values | 0 (0.0%) |
| 5 | cid [numeric] | 7886 distinct values | 0 (0.0%) |
| 6 | district [numeric] | 1193 distinct values | 144 (0.6%) |
| 7 | county [numeric] | 253 distinct values | 144 (0.6%) |
| 8 | region [numeric] | 20 distinct values | 144 (0.6%) |
| 9 | Student Group [character] | | 0 (0.0%) |
| 10 | Meets Grade Level (%) [integer] | 100 distinct values | 2392 (10.1%) |
| 11 | group_denom [character] | | 0 (0.0%) |
| 12 | Size of Student Group at School [numeric] | 2329 distinct values | 2539 (10.7%) |
| 13 | ln_meets_denom [numeric] | 2329 distinct values | 2539 (10.7%) |

Generated by summarytools 1.0.1 (R version 4.0.3)
2022-07-26
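A match rate like the one quoted above can be computed before joining by comparing the unique campus identifiers in the two tables. Here is a minimal sketch on made-up identifiers; `match_rate` is a hypothetical helper, and the toy vectors stand in for `data$cid` and `school$cid`.

```r
# Hypothetical helper: share of unique base identifiers found in another table
match_rate <- function(base_ids, other_ids) {
  base_ids <- unique(base_ids)
  mean(base_ids %in% other_ids)
}

tapr_ids   <- c(101, 101, 101, 102, 102, 102, 103, 104)  # long format repeats each campus
school_ids <- c(101, 102, 104, 105)

match_rate(tapr_ids, school_ids)       # 0.75
setdiff(unique(tapr_ids), school_ids)  # 103, the campus that a filter on the join would drop
```

Taking `unique()` first matters because the long-format data repeats each campus once per student group.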
Here I write a function to build plotly interactive
visualizations. Readers can hover over the image to get point-level
data, zoom in to a plot region, and toggle a
Student Group on or off in the point/color legend.
Schools are placed on the x-axis by poverty level,
Eco. Disadvantaged (%). The y-axis disaggregates each school by
race/ethnicity to show each racial/ethnic group’s academic
performance as Meets Grade Level (%). The three racial/ethnic
groups at a campus are connected by a vertical line, showing the spread
(y-axis) in performance among the groups within that campus. The size of
each point is mapped to the absolute number of students in
that race/ethnicity category within the school. The curves are generated
through Loess smoothing and display (non-linear) trends
across the x-axis.
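To make the weighting concrete, the sketch below fits a weighted Loess curve directly with `stats::loess()`, the smoother behind `geom_smooth(method = loess)`. The data are simulated, and the column names `poverty`, `meets`, and `size` are stand-ins for the variables used in the plots.

```r
# Simulated campus-level data (not the TAPR data)
set.seed(1)
toy <- data.frame(
  poverty = runif(200, 0, 100),
  size    = sample(5:500, 200, replace = TRUE)
)
toy$meets <- pmin(100, pmax(0, 80 - 0.5 * toy$poverty + rnorm(200, sd = 10)))

# Larger student groups get more weight; the log keeps very large groups from dominating
fit <- loess(meets ~ poverty, data = toy, weights = log(size))

# Evaluate the smoothed trend on a grid inside the observed x-range
grid <- data.frame(poverty = seq(5, 95, by = 10))
grid$fitted <- predict(fit, newdata = grid)
grid
```

Weighting by the logged group size rather than the raw size is the same design choice as `weight = ln_meets_denom` in the plotting code.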
I write a custom function that takes the data and two filtering
parameters to slice the full data set:
campus %>% filter(`Campus Type` == {{type}} & region == {{r}}).
I pass this function to lapply() to generate plots for each
region and Campus Type (excluding K-12 schools
for parsimony).
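The `lapply()` pattern works because each element of the vector is handed to the first argument of the function that has not already been supplied by name. A minimal sketch with toy data follows; `count_campuses` and the `toy` data frame are hypothetical.

```r
# Hypothetical stand-in for the plotting function: same argument order (data, r, type)
count_campuses <- function(data, r, type) {
  nrow(data[data$type == type & data$region == r, ])
}

toy <- data.frame(
  region = c(1, 1, 2, 2, 2),
  type   = c("High School", "Elementary School",
             "High School", "High School", "Elementary School")
)

# data and type are fixed by name, so each region number is matched to r
counts <- lapply(1:2, count_campuses, data = toy, type = "High School")
counts[[2]]  # 2 high schools in region 2
```

The result is a list indexed by position, which is why a region's plot can later be retrieved with `[[region]]`.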
# Regions 1 to 20
r <- 1:20
# Blank list for full sample objects
fs <- list()
# Create function to generate one plot per region for a given campus type
plots <- function(data, r, type) {
  # Use the data argument rather than the global campus object
  p <- ggplot(data = data %>% filter(`Campus Type` == {{type}} & region == {{r}}),
              aes(label = `Campus`, label2 = `Size of Student Group at School`, x = `Eco. Disadvantaged (%)`, y = `Meets Grade Level (%)`)) +
    geom_line(alpha = .3, aes(group = cid)) +
    geom_point(alpha = .4, aes(size = `Size of Student Group at School`, color = `Student Group`)) +
    geom_smooth(size = 2, method = loess, color = "black", se = FALSE, aes(group = `Student Group`, weight = ln_meets_denom)) +
    geom_smooth(size = 1.8, method = loess, se = FALSE, aes(color = `Student Group`, weight = ln_meets_denom)) +
    guides(size = FALSE) +
    theme(legend.position = "bottom") +
    xlim(c(0,100)) + ylim(c(0,100)) +
    ggtitle(paste("Region", r, type, "Performance by Race/Ethnicity and Poverty", sep = " "))
  # Return the interactive plot explicitly so that lapply() collects it
  ggplotly(p, tooltip = c("label", "label2", "x", "y"))
}
# Stores high school plots for 20 regions in indexed list
gg_hs <- lapply(r, plots, data = campus, type = "High School")
# Stores middle and junior high school plots for 20 regions in indexed list
gg_jhs <- lapply(r, plots, data = campus, type = "Middle & Jr. High School")
# Stores elementary school plots for 20 regions in indexed list
gg_es <- lapply(r, plots, data = campus, type = "Elementary School")
# Full sample: High school
t <- ggplot(data = campus %>% filter(`Campus Type` == "High School"),
aes(label = `Campus`, label2 = `Size of Student Group at School`, x = `Eco. Disadvantaged (%)`, y = `Meets Grade Level (%)`)) +
geom_line(alpha = .3, aes(group = cid)) +
geom_point(alpha = .4, aes(size = `Size of Student Group at School`, color = `Student Group`)) +
geom_smooth(size = 2, method = loess, color = "black", se = FALSE, aes(group = `Student Group`, weight = ln_meets_denom)) +
geom_smooth(size = 1.8, method = loess, se = FALSE, aes(color = `Student Group`, weight = ln_meets_denom)) +
guides(size = FALSE) +
theme(legend.position = "bottom") +
xlim(c(0,100)) + ylim(c(0,100)) +
ggtitle("Full Sample of High Schools: Performance by Race/Ethnicity and Poverty")
fs[[1]] <- ggplotly(t, tooltip = c("label", "label2", "x", "y"))
# Full sample: Jr. High school
t <- ggplot(data = campus %>% filter(`Campus Type` == "Middle & Jr. High School"),
aes(label = `Campus`, label2 = `Size of Student Group at School`, x = `Eco. Disadvantaged (%)`, y = `Meets Grade Level (%)`)) +
geom_line(alpha = .3, aes(group = cid)) +
geom_point(alpha = .4, aes(size = `Size of Student Group at School`, color = `Student Group`)) +
geom_smooth(size = 2, method = loess, color = "black", se = FALSE, aes(group = `Student Group`, weight = ln_meets_denom)) +
geom_smooth(size = 1.8, method = loess, se = FALSE, aes(color = `Student Group`, weight = ln_meets_denom)) +
guides(size = FALSE) +
theme(legend.position = "bottom") +
xlim(c(0,100)) + ylim(c(0,100)) +
ggtitle("Full Sample of Middle Schools: Performance by Race/Ethnicity and Poverty")
fs[[2]] <- ggplotly(t, tooltip = c("label", "label2", "x", "y"))
# Full sample: Elementary school
t <- ggplot(data = campus %>% filter(`Campus Type` == "Elementary School"),
aes(label = `Campus`, label2 = `Size of Student Group at School`, x = `Eco. Disadvantaged (%)`, y = `Meets Grade Level (%)`)) +
geom_line(alpha = .3, aes(group = cid)) +
geom_point(alpha = .4, aes(size = `Size of Student Group at School`, color = `Student Group`)) +
geom_smooth(size = 2, method = loess, color = "black", se = FALSE, aes(group = `Student Group`, weight = ln_meets_denom)) +
geom_smooth(size = 1.8, method = loess, se = FALSE, aes(color = `Student Group`, weight = ln_meets_denom)) +
guides(size = FALSE) +
theme(legend.position = "bottom") +
xlim(c(0,100)) + ylim(c(0,100)) +
ggtitle("Full Sample of Elementary Schools: Performance by Race/Ethnicity and Poverty")
fs[[3]] <- ggplotly(t, tooltip = c("label", "label2", "x", "y"))
The functions have stored three sets of 20 plots within indexed lists: one plot for each Texas Education Service Center (ESC) Region at each campus level. You can view all 60 plots here. You can see a map of the Texas ESC Regions here.
Below are the plots for the largest (Region 4: Houston area) and a mid-sized (Region 19: El Paso area) subset. These exploratory plots show a strong relationship between poverty, race/ethnicity, and educational outcomes. Between schools, and generally within schools, white students tend to have higher rates of meeting grade-level expectations. The full sample plots reveal a persistent difference in level (or growth intercept) by race, regardless of the age group of the students. However, the trend across poverty levels (the negative growth slope) becomes increasingly non-linear as students age into high school. This is evident in the increasingly steep Loess curves at the extremes of the x-axis for high school campuses compared with other campuses, and for junior high campuses compared with elementary campuses. This analysis is exploratory: I have not made sufficient efforts at eliminating endogeneity to make causal claims. One clear takeaway, though, is that further work with this data would need to account for this non-linearity. Otherwise, relationships among these factors might be overlooked.
# Houston: Region 4
htmltools::tagList(list(gg_hs[[4]], gg_jhs[[4]], gg_es[[4]]))
# El Paso: Region 19
htmltools::tagList(list(gg_hs[[19]], gg_jhs[[19]], gg_es[[19]]))
# Full Samples
htmltools::tagList(list(fs[[1]], fs[[2]], fs[[3]]))
How does private school attendance moderate the relationship between race/ethnicity, poverty levels, and educational outcomes? I downloaded the 2019 private school enrollment data from the Texas Private School Accreditation Commission site. I grouped the data by region and combined it with the public school-level data from the Texas Education Agency used above. I want to explore whether different patterns emerge in these relationships when the data is subset to small regions with low and high rates of private school attendance.
# Load 2019 private school data and Filter out pre-K and closed schools
private <- read_excel("docs/private schools 2018-2019.xlsx")
private <- private %>%
filter(!`Grade High` %in% c("Pre-K", "Early Education") & Closed == FALSE) %>%
transmute(
district = as.numeric(`District Number`),
county = as.numeric(`County Number`),
region = as.numeric(`Region Name`),
enrollment_private = as.numeric(Enrollment))
# Aggregate enrollment by region
private_region <- private %>%
group_by(region) %>%
summarise(
private_enrollment_region = sum(enrollment_private, na.rm = TRUE)) %>%
ungroup()
# Aggregate public school enrollment by region
public_region <- school %>%
left_join(codes, by = "cid") %>%
group_by(region) %>%
summarise(
public_enrollment_region = sum(enrollment_public, na.rm = TRUE)) %>%
ungroup()
#Join and create variables and factor labels
region <- left_join(private_region, public_region, by = "region")
region <- region %>%
mutate(
`Total Enrollment` = private_enrollment_region + public_enrollment_region,
`Private Students in Region (%)` = 100 * ( private_enrollment_region / `Total Enrollment`),
Region_lab = paste("Region", region, sep = " "),
Region = factor(region, ordered = TRUE, labels = Region_lab),
`Private School Rates` = ifelse(
region %in% c(3, 18, 2, 19, 6), "High and Small Pop.", ifelse(
region %in% c(12, 15, 14, 8, 16), "Low and Small Pop.", "Middle or Large Pop."))) %>%
arrange(`Private Students in Region (%)`) %>%
mutate(
Region_lab = paste("Region", region, sep = " "),
`Region Ordered` = factor(region, levels = region,
labels = paste("Region", region, sep = " ")))
In order to make the comparison more reasonable, I focus on ESC
regions with similar Total Enrollment that have low and
high rates of private school attendance. Thus, the groups
of schools I compare later come from the red regions (small regions with
high rates of private school attendance) and the blue regions (small regions
with low rates of private school attendance). I should note that this
effort to choose comparable groups of regions does not sufficiently
eliminate endogeneity to consider any outcomes causal. This is an
exploratory analysis.
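The regional private school share behind these groups is simply private enrollment divided by total (private plus public) enrollment. The base-R sketch below reproduces that arithmetic on made-up numbers; the data frames and column names are hypothetical and do not come from the TEA or accreditation files.

```r
# Made-up school-level enrollment for two regions
private <- data.frame(region = c(1, 1, 2), enrollment = c(100, 50, 30))
public  <- data.frame(region = c(1, 2, 2), enrollment = c(850, 470, 500))

# Sum enrollment within each region, then join the two regional tables
private_region <- aggregate(enrollment ~ region, data = private, FUN = sum)
public_region  <- aggregate(enrollment ~ region, data = public, FUN = sum)
shares <- merge(private_region, public_region, by = "region",
                suffixes = c("_private", "_public"))

# Private share of all enrolled students, in percent
shares$pct_private <- 100 * shares$enrollment_private /
  (shares$enrollment_private + shares$enrollment_public)
shares
```

In this toy example region 1 has 150 of 1,000 students in private schools (15%) and region 2 has 30 of 1,000 (3%).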
# Scatter and bar plots
s <- ggplot(region,
aes(x = log(`Total Enrollment`),
y = `Private Students in Region (%)`,
label = region,
color = `Private School Rates`)) +
geom_text(size = 5) +
guides(color = FALSE)
b <- ggplot(region,
aes(x = `Private Students in Region (%)`,
y = `Region Ordered`,
fill = `Private School Rates`)) +
geom_col() +
theme(legend.position = "bottom")
ggarrange(s, b, nrow = 1, common.legend = TRUE, legend = "bottom")
The most notable difference here is that extremely privileged or disadvantaged schools have more extreme values in regions where there are high rates of private school attendance. We can further examine this by directly comparing racial/ethnic groups.
# Low and high private school attendance
ps <- list()
t <- ggplot(data = campus %>% filter(`Campus Type` == "High School" & region %in% c(12, 15, 14, 8, 16)),
aes(label = `Campus`, label2 = `Size of Student Group at School`, x = `Eco. Disadvantaged (%)`, y = `Meets Grade Level (%)`)) +
geom_line(alpha = .3, aes(group = cid)) +
geom_point(alpha = .4, aes(size = `Size of Student Group at School`, color = `Student Group`)) +
geom_smooth(size = 2, method = loess, color = "black", se = FALSE, aes(group = `Student Group`, weight = ln_meets_denom)) +
geom_smooth(size = 1.8, method = loess, se = FALSE, aes(color = `Student Group`, weight = ln_meets_denom)) +
guides(size = FALSE) +
theme(legend.position = "bottom") +
xlim(c(0,100)) + ylim(c(0,100)) +
ggtitle("High Schools: Small Regions with Lower Rates of Private School Attendance")
ps[[1]] <- ggplotly(t, tooltip = c("label", "label2", "x", "y"))
t <- ggplot(data = campus %>% filter(`Campus Type` == "High School" & region %in% c(3, 18, 2, 19, 6)),
aes(label = `Campus`, label2 = `Size of Student Group at School`, x = `Eco. Disadvantaged (%)`, y = `Meets Grade Level (%)`)) +
geom_line(alpha = .3, aes(group = cid)) +
geom_point(alpha = .4, aes(size = `Size of Student Group at School`, color = `Student Group`)) +
geom_smooth(size = 2, method = loess, color = "black", se = FALSE, aes(group = `Student Group`, weight = ln_meets_denom)) +
geom_smooth(size = 1.8, method = loess, se = FALSE, aes(color = `Student Group`, weight = ln_meets_denom)) +
guides(size = FALSE) +
theme(legend.position = "bottom") +
xlim(c(0,100)) + ylim(c(0,100)) +
ggtitle("High Schools: Small Regions with Higher Rates of Private School Attendance")
ps[[2]] <- ggplotly(t, tooltip = c("label", "label2", "x", "y"))
htmltools::tagList(list(ps[[1]], ps[[2]]))
The final exploratory analysis takes the same data and slices it into the three racial/ethnic groups, comparing regions with higher and lower rates of private school attendance. Schools in regions with lower private school attendance rates do not tend to have greater overall rates of meeting their grade levels. One interesting pattern in the data is that regions with lower rates of private school attendance have lower within-region inequality among schools. That is to say, extremely privileged (disadvantaged) campuses in regions with higher rates of private school attendance perform far better (worse) than other campuses within their own regions. Privileged (disadvantaged) campuses in regions with lower rates of private school attendance still perform better (worse); however, the differences within those regions are less extreme.
campus <- campus %>%
mutate(
`Private School Rates` = ifelse(
region %in% c(3, 18, 2, 19, 6), "High and Small Pop.", ifelse(
region %in% c(12, 15, 14, 8, 16), "Low and Small Pop.", "Middle or Large Pop.")))
t <- ggplot(data = campus %>% filter(`Campus Type` == "High School" & region %in% c(12, 15, 14, 8, 16, 3, 18, 2, 19, 6) & `Student Group` == "African American"),
aes(label = `Campus`, label2 = `Size of Student Group at School`, x = `Eco. Disadvantaged (%)`, y = `Meets Grade Level (%)`)) +
#geom_line(alpha = .3, aes(group = cid)) +
geom_point(alpha = .4, aes(size = `Size of Student Group at School`, color = `Private School Rates`)) +
geom_smooth(size = 2, method = loess, color = "black", se = FALSE, aes(group = `Private School Rates`, weight = ln_meets_denom)) +
geom_smooth(size = 1.8, method = loess, se = FALSE, aes(color = `Private School Rates`, weight = ln_meets_denom)) +
guides(size = FALSE) +
theme(legend.position = "bottom") +
xlim(c(0,100)) + ylim(c(0,100)) +
ggtitle("African American Students: High Schools in Small Regions with Lower and Higher Rates of Private School Attendance")
ps[[3]] <- ggplotly(t, tooltip = c("label", "label2", "x", "y"))
t <- ggplot(data = campus %>% filter(`Campus Type` == "High School" & region %in% c(12, 15, 14, 8, 16, 3, 18, 2, 19, 6) & `Student Group` == "Hispanic"),
aes(label = `Campus`, label2 = `Size of Student Group at School`, x = `Eco. Disadvantaged (%)`, y = `Meets Grade Level (%)`)) +
#geom_line(alpha = .3, aes(group = cid)) +
geom_point(alpha = .4, aes(size = `Size of Student Group at School`, color = `Private School Rates`)) +
geom_smooth(size = 2, method = loess, color = "black", se = FALSE, aes(group = `Private School Rates`, weight = ln_meets_denom)) +
geom_smooth(size = 1.8, method = loess, se = FALSE, aes(color = `Private School Rates`, weight = ln_meets_denom)) +
guides(size = FALSE) +
theme(legend.position = "bottom") +
xlim(c(0,100)) + ylim(c(0,100)) +
ggtitle("Hispanic Students: High Schools in Small Regions with Lower and Higher Rates of Private School Attendance")
ps[[4]] <- ggplotly(t, tooltip = c("label", "label2", "x", "y"))
t <- ggplot(data = campus %>% filter(`Campus Type` == "High School" & region %in% c(12, 15, 14, 8, 16, 3, 18, 2, 19, 6) & `Student Group` == "White"),
aes(label = `Campus`, label2 = `Size of Student Group at School`, x = `Eco. Disadvantaged (%)`, y = `Meets Grade Level (%)`)) +
#geom_line(alpha = .3, aes(group = cid)) +
geom_point(alpha = .4, aes(size = `Size of Student Group at School`, color = `Private School Rates`)) +
geom_smooth(size = 2, method = loess, color = "black", se = FALSE, aes(group = `Private School Rates`, weight = ln_meets_denom)) +
geom_smooth(size = 1.8, method = loess, se = FALSE, aes(color = `Private School Rates`, weight = ln_meets_denom)) +
guides(size = FALSE) +
theme(legend.position = "bottom") +
xlim(c(0,100)) + ylim(c(0,100)) +
ggtitle("White Students: High Schools in Small Regions with Lower and Higher Rates of Private School Attendance")
ps[[5]] <- ggplotly(t, tooltip = c("label", "label2", "x", "y"))
htmltools::tagList(list(ps[[3]], ps[[4]], ps[[5]]))
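One simple way to put a number on the "less extreme" spread described above would be to compare the interquartile range of Meets Grade Level (%) across the two groups of regions. The sketch below uses made-up values purely to show the mechanics; it is not a result from the data, and the column names `rate_group` and `meets` are hypothetical.

```r
# Made-up campus-level values for two groups of regions
toy <- data.frame(
  rate_group = rep(c("High", "Low"), each = 5),
  meets      = c(10, 30, 50, 70, 90, 40, 45, 50, 55, 60)
)

# Interquartile range of the outcome within each group
spread <- tapply(toy$meets, toy$rate_group, IQR)
spread
```

A wider interquartile range in one group would be consistent with greater between-campus inequality there, although it ignores group size and the poverty gradient that the Loess curves capture.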